Unsupervised Resource Creation for Textual Inference Applications

Authors

  • Jeremy Bensley
  • Andrew Hickl
Abstract

This paper explores how a battery of unsupervised techniques can be used to create large, high-quality corpora for textual inference applications, such as systems for recognizing textual entailment (TE) and textual contradiction (TC). We show that it is possible to automatically generate sets of positive and negative instances of textual entailment and contradiction from textual corpora with greater than 90% precision. We describe how we generated more than 1 million TE pairs – and a corresponding set of 500,000 TC pairs – from the documents found in the 2 GB AQUAINT-2 newswire corpus.

Similar resources

Without a "doubt"? Unsupervised Discovery of Downward-Entailing Operators

An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement ‘We know the epidemic spread quickly’ does not entail ‘We know the epidemic spread quickly via fleas’, but ‘We doubt the epidemic spread quickly’ entails ‘We doubt the epidemic spread...

A Test Suite for Inference Involving Adjectives

Recently, most of the research in NLP has concentrated on the creation of applications coping with textual entailment. However, there still exist very few resources for the evaluation of such applications. We argue that the reason for this resides not only in the novelty of the research field but also and mainly in the difficulty of defining the linguistic phenomena which are responsible for in...

Knowledge-Based Textual Inference via Parse-Tree Transformations

Textual inference is an important component in many applications for understanding natural language. Classical approaches to textual inference rely on logical representations for meaning, which may be regarded as “external” to the natural language itself. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, which correspond closely to language st...

Annotating Lexically Entailed Subevents for Textual Inference Tasks

This paper presents a procedure for constructing an Event Structure Lexicon (ESL), a resource which represents the lexically-entailed subevents in text as a support for textual inference tasks. The ESL is used as a resource for a subevent markup algorithm, called SUBEVITA, which annotates event implicatures on top of TimeML-based extraction algorithms. Such a resource can be used independently ...

LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules

Semantic inference is a core component of many natural language applications. In response, several researchers have developed algorithms for automatically learning inference rules from textual corpora. However, these rules are often either imprecise or underspecified in directionality. In this paper we propose an algorithm called LEDIR that filters incorrect inference rules and identifies the d...

Publication date: 2008